The Ehrenfeucht-Mycielski Sequence

Author

  • Klaus Sutner
Abstract

We show that the Ehrenfeucht-Mycielski sequence U is strongly balanced in the following sense: for any finite word w of length k, the limiting frequency of w in U is $2^{-k}$.

1. The Ehrenfeucht-Mycielski Sequence

In [2] Ehrenfeucht and Mycielski introduced an infinite binary word based on avoiding repetitions. More precisely, to construct the Ehrenfeucht-Mycielski (EM) sequence U, start with a single bit 0. Suppose the first n bits $U_n = u_1 u_2 \ldots u_n$ have already been chosen. Find the longest suffix v of $U_n$ that already appears in $U_{n-1}$. Find the last occurrence of v in $U_{n-1}$, and let b be the first bit following that occurrence of v. Lastly, set $u_{n+1} = \bar{b}$, the complement of b. It is understood that if no non-empty suffix has a prior occurrence, the last bit in the sequence is flipped. The resulting sequence starts like so:

01001101011100010000111101100101001001110

See also sequence A038219 in [7].

Since the Ehrenfeucht-Mycielski sequence is defined to avoid repetitions, one might suspect that it contains all finite words as factors; in the reference the authors show that this is indeed the case. The language pref(U) of all prefixes of U fails to be regular. Hence it follows from the gap theorem in [1] that pref(U) cannot be context-free. On the other hand, it is clear that a linear bounded automaton can recognize pref(U), so this language is context-sensitive. Indeed, it follows from the results in section 2 that one can recognize prefixes of the Ehrenfeucht-Mycielski word in logarithmic space and quadratic time using KMP.

Much better results can be achieved with a hash-based algorithm, see [6, 3]. The second reference shows that under the assumption of near-monotonicity, see 1.2, one can generate a bit of the sequence in amortized constant time. Moreover, only linear space is required to construct an initial segment of the sequence, so that a simple laptop computer suffices to generate the first billion bits of the sequence in less than an hour, see [3].

Storing the first billion bits in the obvious bit-packed format requires 125 million bytes, and there is little hope of reducing this amount of space using data compression: the very definition of the EM sequence foils standard algorithms. For example, the Lempel-Ziv based gzip algorithm produces a “compressed” file of size 159,410 bytes from the first million bits of the EM sequence, which take only 125,000 bytes in bit-packed form. The Burrows-Wheeler type bzip2 algorithm produces an even larger file of size 165,362 bytes.

Date: March 28, 2007.
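The construction rule from Section 1 can be sketched directly in Python. This is a naive illustration that rescans the prefix for every new bit; it is not the KMP-based or hash-based method of [6, 3], and the names `em_prefix` and `factor_frequencies` are hypothetical helpers chosen for this sketch. The second helper counts how often each word of length k occurs in a prefix, which gives a rough empirical view of the limiting frequency $2^{-k}$; a short prefix only loosely approximates the limit.

```python
from collections import Counter


def em_prefix(n: int) -> str:
    """Return the first n bits of the EM sequence as a string of '0'/'1'.

    Naive sketch: for each new bit, try suffixes from longest to shortest
    and look for the last earlier occurrence with str.rfind.
    """
    if n <= 0:
        return ""
    s = "0"  # the construction starts with a single bit 0
    while len(s) < n:
        m = len(s)
        # Fallback rule: no non-empty suffix occurs earlier, so flip the last bit.
        nxt = "1" if s[-1] == "0" else "0"
        for k in range(m - 1, 0, -1):
            suffix = s[m - k:]
            # Search only inside s[0:m-1], i.e. the prefix without its last bit,
            # so the suffix itself is never counted as an occurrence.
            pos = s.rfind(suffix, 0, m - 1)
            if pos >= 0:
                b = s[pos + k]  # bit following the last occurrence
                nxt = "1" if b == "0" else "0"  # append the complement of b
                break
        s += nxt
    return s


def factor_frequencies(s: str, k: int) -> dict:
    """Relative frequency of each length-k word among all windows of s."""
    windows = len(s) - k + 1
    if k <= 0 or windows <= 0:
        return {}
    counts = Counter(s[i:i + k] for i in range(windows))
    return {w: c / windows for w, c in counts.items()}


if __name__ == "__main__":
    print(em_prefix(41))  # 01001101011100010000111101100101001001110
    print(factor_frequencies(em_prefix(2000), 2))
```

Because every step rescans the whole prefix, this sketch is only practical for a few thousand bits; the amortized constant-time generation mentioned above requires the data structures of [3].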





Publication date: 2003